-
- LhA V1.32 Application Info
-
- File structure and
- algorithms.
-
- By Stefan Boberg 1991,92
-
- NB: This is an early version of the document, so it is not complete.
-
-
- Format of a LZH / LHA file
- --------------------------
-
- LHA files have exactly the same file format and structure as LZH files,
- but LHA files are generally compressed with -lh5- compression, while LZH
- files are generally -lh1- compressed (see the section on compression
- modes).
-
- Files can be stored in arbitrary order in the archive file.
-
- The overall file format is as follows:
-
- [file header]
- [file data]
- [file header]
- [file data]
- .
- .
- .
- [archive terminator]
-
-
- The file header is laid out as follows:
-
- Case 1: (header level 0)
-
- Header size (in bytes)                  1 byte
- Header checksum                         1 byte
- Storage method                          5 bytes
- Compressed size                         4 bytes
- Original size                           4 bytes
- Last mod file date & time               4 bytes
- File attributes                         1 byte
- Header level [0]                        1 byte
- Filename length                         1 byte
- Filename & filenote                     variable size
- File CRC-16                             2 bytes
-
- Case 2: (header level 1)
-
- Header size (in bytes)                  1 byte
- Header checksum                         1 byte
- Storage method                          5 bytes
- Compressed size                         4 bytes
- Original size                           4 bytes
- Last mod file date & time               4 bytes
- File attributes                         1 byte
- Header level [1]                        1 byte
- Filename length                         1 byte
- Filename & filenote                     variable size
- File CRC-16                             2 bytes
- Host Operating System                   1 byte
-
- Extension size                          2 bytes
- Extension data                          variable size
- ...
- Extension terminator [0]                2 bytes
-
- Case 3: (header level 2)
-
- Header size (in bytes)                  2 bytes
- Storage method                          5 bytes
- Compressed size                         4 bytes
- Original size                           4 bytes
- Last mod file date & time (UNIX-Fmt)    4 bytes
- File attributes                         1 byte
- Header level [2]                        1 byte
- Filename length                         1 byte
- Filename & filenote                     variable size
- File CRC-16                             2 bytes
- Host Operating System                   1 byte
-
- Extension size                          2 bytes
- Extension data                          variable size
- ...
- Extension terminator [0]                2 bytes
-
-
- The compressed file data follows immediately after the last header byte.
-
- The archive terminator is a single 0 byte after the last data byte of the
- last file in the archive.
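
The layout above can be sketched in C as a loop that walks an archive held in memory and counts its entries. This is a minimal sketch, not LhA's own code: it assumes level-0 headers only, the function name count_level0_entries is hypothetical, and the field offsets (compressed size at byte 7, header level at byte 20) are simply counted from the level-0 table above.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the entries of an in-memory archive that uses level-0 headers.
   Returns -1 if the data is truncated or another header level is found.
   (Illustrative helper; the name is not part of LhA.) */
static long count_level0_entries(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    long count = 0;

    while (pos < len && buf[pos] != 0) {          /* 0 byte = archive terminator */
        /* The size byte excludes itself and the checksum byte. */
        size_t header_total = (size_t)buf[pos] + 2;
        uint32_t packed;

        /* A level-0 header with an empty filename is 22 bytes long. */
        if (buf[pos] < 22 || header_total > len - pos)
            return -1;
        if (buf[pos + 20] != 0)                   /* header level field */
            return -1;

        /* Compressed size: 4 bytes, least significant byte first. */
        packed = (uint32_t)buf[pos + 7]
               | ((uint32_t)buf[pos + 8] << 8)
               | ((uint32_t)buf[pos + 9] << 16)
               | ((uint32_t)buf[pos + 10] << 24);

        if (packed > len - pos - header_total)
            return -1;

        pos += header_total + packed;             /* skip header and file data */
        count++;
    }
    return count;
}
```

A missing terminator byte is tolerated here; a stricter reader could treat it as an error.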
-
-
- Explanation of fields
- ---------------------
-
- All fields are encoded in Intel-format, i.e. 16-bit quantities are stored
- with the least significant byte first. 32-bit quantities are stored as two
- 16-bit Intel words with the least significant word first.
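
For readers on non-Intel hosts, the byte order described above can be decoded portably, one byte at a time. The helper names read_le16 and read_le32 below are illustrative assumptions, not part of LhA.

```c
#include <stdint.h>

/* Read a 16-bit Intel-format value: least significant byte first. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit value stored as two 16-bit Intel words,
   least significant word first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)read_le16(p) | ((uint32_t)read_le16(p + 2) << 16);
}
```

Reading byte by byte avoids any dependence on the host's own byte order or alignment rules.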
-
-
- Header size
-
- This unsigned byte contains the length of the header excluding the
- header checksum byte and the header size byte itself.
-
- With level-1 headers, the extended headers are NOT included in the
- header size count (except for the first two-byte length word).
-
- With level-2 headers, this is a two-byte word field containing the
- length of the entire header including all extended headers.
-
- Header checksum
-
- This byte contains the modulo-256 checksum of the header, which is
- calculated as follows (C):
-
- unsigned char header_checksum(const unsigned char *header)
- {
-     unsigned char length   = header[0];   /* Header size field */
-     unsigned char checksum = 0;
-
-     /* Sum the `length' bytes that follow the size and checksum
-        bytes, i.e. header[2] .. header[length + 1]. */
-     while (length) {
-         checksum += header[length + 1];
-         length--;
-     }
-
-     return checksum;   /* unsigned char arithmetic wraps modulo 256 */
- }
-
-
- Storage method
-
- This is a 5-byte ASCII char array containing the storage method ID.
- See the section about compression methods for a list of IDs.
-
-
- Compressed size
- Original size
-
- These 4-byte fields contain the size of the file in its compressed
- and original state, respectively.
-
-
- Last file modification date & time
-
- The date and time are encoded in standard MS-DOS format. The 32-bit word
- is divided into bit fields like this:
-
-   Bits 25 - 31    Year - 1980
-        21 - 24    Month       [1..12]
-        16 - 20    Day         [1..31]
-        11 - 15    Hour        [0..23]
-         5 - 10    Minute      [0..59]
-         0 - 4     Seconds/2   [0..29]
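
The bit fields above can be unpacked with shifts and masks. The following is a small sketch; the struct and function names are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Broken-down MS-DOS timestamp (illustrative type, not part of LhA). */
struct dos_datetime {
    int year, month, day, hour, minute, second;
};

/* Unpack the 32-bit MS-DOS date and time word described above. */
static struct dos_datetime decode_dos_datetime(uint32_t stamp)
{
    struct dos_datetime t;

    t.year   = (int)((stamp >> 25) & 0x7F) + 1980;  /* bits 25-31 */
    t.month  = (int)((stamp >> 21) & 0x0F);         /* bits 21-24 */
    t.day    = (int)((stamp >> 16) & 0x1F);         /* bits 16-20 */
    t.hour   = (int)((stamp >> 11) & 0x1F);         /* bits 11-15 */
    t.minute = (int)((stamp >> 5)  & 0x3F);         /* bits  5-10 */
    t.second = (int)(stamp & 0x1F) * 2;             /* bits  0-4, 2-second units */
    return t;
}
```

Because seconds are stored divided by two, odd second values cannot be represented in this format.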
-
-
- With level-2 headers, things are a bit different. In this case the date
- is stored in UNIX-format. A UNIX timestamp is a 32-bit integer containing
- the number of seconds since January 1, 1970.
-
-
- File attributes
-
- This byte field contains the file attribute bits; the format depends
- on the host operating system.
-
-
- Header level
-
- This byte field indicates what kind of header this is. It can
- currently be 0 (original LhArc format), 1 or 2 (Unix LHarc/LHA
- format).
-
-
- Filename length
-
- This field contains the length (in bytes) of the filename.
-
- Amiga LhArc/LhA stores filenotes in level-0 headers in the filename
- field. When a filenote is present, the filename is null-terminated
- (it is not normally null-terminated) and the filenote follows the
- null byte. The length of the filenote and the null byte must be
- included in the filename length count. This way of storing filenotes
- is compatible with all versions of LhArc, so Amiga LZH archives with
- filenotes can be processed on other platforms without problems.
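
Splitting the filename field into name and filenote, as described above, can be sketched as follows. The function name split_name_field and its out-parameters are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Split a level-0 filename field into filename and optional filenote.
   Returns the filename length. *note points at the filenote (or is NULL
   if there is none) and *note_len receives its length.
   (Illustrative helper; not part of LhA.) */
static size_t split_name_field(const unsigned char *field, size_t field_len,
                               const unsigned char **note, size_t *note_len)
{
    const unsigned char *nul = memchr(field, 0, field_len);

    if (nul == NULL) {               /* plain filename, no filenote */
        *note = NULL;
        *note_len = 0;
        return field_len;
    }
    *note = nul + 1;                 /* filenote starts after the null byte */
    *note_len = field_len - (size_t)(nul - field) - 1;
    return (size_t)(nul - field);
}
```

Neither part is null-terminated by this function, so callers should use the returned lengths rather than string functions.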
-
-
- Filename & filenote
-
- This field contains the filename and (optional) filenote.
-
-
- File CRC-16
-
- This field contains the CRC-16 of the source (uncompressed) file.
- It is used to check the integrity of the archive during extract and
- test operations.
-
- CRC
- ---
-
- The CRC is a standard ANSI 16-bit CRC. It is calculated as follows:
-
- (Pseudo-C)
-
- unsigned short calcCRC(unsigned char *buffer, unsigned int length)
- {
-     unsigned short crc = 0;
-     unsigned int   i   = 0;
-     unsigned char  c;
-
-     while (i < length) {
-         c = buffer[i++];
-         /* crctable[] must have been filled in by make_crctable() */
-         crc = crctable[(crc ^ c) & 0xFF] ^ (crc >> 8);
-     }
-     return crc;
- }
-
- The CRC-table is built as follows:
-
- unsigned short crctable[256];
-
- void make_crctable(void)
- {
-     unsigned int i, j, r;
-
-     for (i = 0; i < 256; i++) {
-         r = i;
-         for (j = 0; j < 8; j++) {
-             if (r & 1)
-                 r = (r >> 1) ^ 0xA001;
-             else
-                 r >>= 1;
-         }
-         crctable[i] = r;
-     }
- }
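
As a cross-check for an implementation of the table-driven routine, the same CRC can be computed bit by bit without a table. The sketch below assumes the parameters given above (polynomial 0xA001 in reflected form, initial value 0); the name crc16_bitwise is hypothetical. With these parameters the ASCII string "123456789" should yield 0xBB3D, the check value commonly published for this CRC variant.

```c
#include <stddef.h>

/* Table-free CRC-16: processes each input byte one bit at a time,
   least significant bit first. (Illustrative helper; not part of LhA.) */
static unsigned short crc16_bitwise(const unsigned char *buffer, size_t length)
{
    unsigned short crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < length; i++) {
        crc ^= buffer[i];
        for (bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (unsigned short)((crc >> 1) ^ 0xA001);
            else
                crc = (unsigned short)(crc >> 1);
        }
    }
    return crc;
}
```

The bitwise form is slower than the table lookup but is handy for validating a freshly built table.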
-
- Extended headers
- -----------------
-
- The `extended headers' are used in level-1 and level-2 headers to store
- optional or variably-sized information such as filenotes, operating-system
- specific attributes etc. The general structure of an extended header is:
-
- Length    [2 bytes]
- Type      [1 byte]
- Data      [Length - 3 bytes]
-
- The length count includes the length, type and data fields, i.e.
- data field length + 3 = Length.
-
- The extended-headers block is terminated by 2 zero bytes (zero length).
-
- The currently implemented headers are:
-
- Type
- ----
-
- 0 Common header (Data = Header CRC16)
-
- 1 Filename header (Data = ASCII string of Filename, excluding
- directory names)
-
- 2 Dirname header (Data = ASCII string of Directory name,
- excluding trailing slash). Node delimiter
- is 0xFF (octal 0377, decimal 255)
-
- 0x40 Attribute header (Data = Two-byte word containing file
- attributes). This overrides the attribute
- field in the main header.
-
- 0x71 Filenote header (Data = ASCII string of filenote)
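
Searching an extended-headers block for a given type can be sketched as below. This follows the Length / Type / Data layout exactly as described in this section and stops at the zero-length terminator; the function name find_ext_header is an assumption made for the example.

```c
#include <stddef.h>

/* Find the first extended header of the given type in a block laid out
   as Length (2 bytes), Type (1 byte), Data (Length - 3 bytes).
   Returns a pointer to the data and stores its length in *data_len,
   or returns NULL if the type is absent or the block is malformed.
   (Illustrative helper; not part of LhA.) */
static const unsigned char *find_ext_header(const unsigned char *ext, size_t len,
                                            unsigned char type, size_t *data_len)
{
    size_t pos = 0;

    while (len - pos >= 2) {
        size_t length = (size_t)ext[pos] | ((size_t)ext[pos + 1] << 8);

        if (length == 0)                        /* two zero bytes: terminator */
            break;
        if (length < 3 || length > len - pos)   /* malformed or truncated */
            break;
        if (ext[pos + 2] == type) {
            *data_len = length - 3;
            return ext + pos + 3;
        }
        pos += length;                          /* step to the next header */
    }
    return NULL;
}
```

For example, passing type 0x71 would locate the filenote header, and type 1 the filename header.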
-
-
- Compression modes
- -----------------
-
- Currently, a file can be stored in the archive in one of four ways: it
- can be STORED (not compressed) or FROZEN (compressed) in three different
- ways. Directory entries use a fifth ID. The method IDs are listed in the
- table below:
-
- Method       ID
- ------------------
- Stored       -lh0-
- Frozen       -lh1-
- Frozen       -lh4-
- Frozen       -lh5-
- Directory    -lhd-
- ------------------
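
A reader typically maps the 5-byte storage method field onto one of the IDs in the table. The sketch below shows one way to do that; the enum and the function name classify_method are hypothetical.

```c
#include <string.h>

/* Storage methods from the table above (illustrative enum). */
enum lha_method {
    LHA_STORED, LHA_LH1, LHA_LH4, LHA_LH5, LHA_DIRECTORY, LHA_UNKNOWN
};

/* Map the 5-byte method ID field from a header to an enum value.
   The field is not null-terminated, so it is compared with memcmp. */
static enum lha_method classify_method(const unsigned char *id)
{
    static const char *const ids[] = {
        "-lh0-", "-lh1-", "-lh4-", "-lh5-", "-lhd-"
    };
    int i;

    for (i = 0; i < 5; i++) {
        if (memcmp(id, ids[i], 5) == 0)
            return (enum lha_method)i;
    }
    return LHA_UNKNOWN;
}
```

Returning LHA_UNKNOWN lets the caller skip entries whose method it cannot decode instead of failing on the whole archive.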
-
- I. STORED
-
- Compression:
-
- A stored file is not compressed. The file data should be copied
- directly from the source file to the archive. The CRC16 for the file
- must be calculated and stored in the header for the data integrity check.
-
- Decompression:
-
- A stored file is not compressed. The file data can be copied
- directly from the archive to the destination, while calculating the
- CRC16 for the file.
-
- II. FROZEN (-lh1-)
-
- Compression:
-
- LZ77 with a 4096-byte window. Literals and copies are encoded with
- dynamic order-0 Huffman codes. Distance codes are encoded with fixed
- order-0 Huffman codes.
-
- [ No algorithm description in this early document version ]
-
- Decompression:
-
- [ No algorithm description in this early document version ]
-
- III. FROZEN (-lh4-)
-
- Compression:
-
- This method is exactly the same as -lh5-, but with a window size
- of 4096 characters. See the description of -lh5- for more info.
-
- Decompression:
-
- This method is exactly the same as -lh5- and can be decompressed
- with the same decompression routine. There is no difference between
- -lh5- and -lh4- from the decompressor's point of view.
-
- IV. FROZEN (-lh5-)
-
- Compression:
-
- LZ77 with an 8192-byte window. Literals and copies are encoded with
- block-adaptive order-0 Huffman codes. The number of distance bits is
- encoded with another set of block-adaptive Huffman codes.
-
- [ No algorithm description in this early document version ]
-
- Decompression:
-
- No buffer initialization required.
-
- [ No algorithm description in this early document version ]
-
- V. Directory (-lhd-)
-
- Compression:
-
- No compression. Set the CRC-16 field to 0000. The directory name
- should include a trailing slash (as in `dir1/dir2/', not
- `dir1/dir2').
-
- Decompression:
-
- No Compression. Just create the directory whose name is in the
- filename field.
-